Quantum learning algorithms for quantum measurements
We study quantum learning algorithms for quantum measurements. The optimal learning
algorithm is derived for arbitrary von Neumann measurements in the case of training with
one or two examples. The analysis of the case of three examples reveals that, differently
from the learning of unitary gates, the optimal algorithm for learning of quantum
measurements cannot be parallelized, and requires quantum memories for the storage of
information.
I. INTRODUCTION



The rapid development of information technology in recent decades
has made the optimization of information-processing tasks an
important field of computer science. For example, one needs to
optimize database search, as well as tasks that emerged with the
internet, e.g. algorithms for anti-spam filters and internet search
engines. The last two tasks are instances of so-called machine
learning [?], which can be defined as follows. Suppose we have a
black box evaluating an unknown function $f$ and we have access to
$N$ uses of it. After we lose access to the black box, we need to
evaluate $f$ on an input that was not previously available.
Naturally, any machine learning strategy has two phases: training
and retrieving. The knowledge of $f$ acquired in the training phase
is encoded into a bit string that is later used as a program
governing the retrieval phase. Obviously, if $N$ is greater than or
equal to the number of possible inputs of $f$, then the training
part of the strategy can acquire complete knowledge of $f$. The same
task, termed quantum learning, can be generalized to quantum theory.
In this case the black box performs an unknown quantum
transformation $T$. The result of the training phase is a quantum
state $\psi_T$. This state has to be kept in a quantum memory until
the retrieving phase, where it enters, together with the unknown
state $\rho$, the retrieving channel that mimics the action of $T$
on $\rho$. We can immediately observe a substantial difference from
machine learning. Even for finite-dimensional quantum systems there
does not exist a finite $N$ for which quantum learning works
perfectly. Indeed, even if the training part of the strategy encoded
full information about $T$ into the finite-dimensional state
$\psi_T$, the no-programming theorem of Nielsen prevents us from
retrieving the transformation perfectly.

A problem closely related to quantum learning was studied as a
quantum version of pattern recognition algorithms [?]. For the case
of quantum learning of channels, the first analysis was published in
Ref. [?], where very simple processing techniques were studied for
learning particular gates such as the Grover oracle [?] or the
discrete Fourier transform. Learning of unitary black boxes was
analyzed in Ref. [?]. Surprisingly, it turns out that the task of
quantum learning of unitaries can be fully parallelized, which means
that the optimal training phase is achieved by applying the $N$ uses
of the black box to a fixed entangled state. Another surprising
feature of this training phase is that it is an optimal estimation
procedure, and hence the quantum memory can be replaced by classical
storage of the estimated unitary black box. The simulation then
consists in the conditional application of the gate corresponding to
the estimated parameters.

In the present paper we consider the case in which the black box to
be learnt is a device performing a von Neumann measurement, namely a
projective non-degenerate Positive Operator Valued Measure (POVM)
$\mathbf{E} := \{E_i\}$. We will show that for measuring black boxes
the surprising features of optimal learning of unitary black boxes
disappear. In particular, we will show that the optimal algorithm
cannot be parallelized, leading to a training phase whose duration
grows with the number of examples. Moreover, the optimal training
does not consist of optimal estimation, and thus requires a coherent
quantum memory for the storage of the learnt measurement.

The paper is organized as follows. In Sec. II we review some
notation and preliminary concepts used in the analysis. In Sec. III
we give the mathematical formulation of the general problem of
optimal learning. In Sec. IV the problem is simplified by exploiting
its symmetries. The problem is then solved in Sec. V for the cases
$N = 1$, $N = 2$ and $N = 3$. Finally, the paper closes with
concluding remarks in Sec. VI.



II. PRELIMINARY CONCEPTS 

In this section we review some notions of the theory of quantum
networks [?]. The main feature of this approach is the
representation of quantum networks in terms of suitably normalized
positive operators.

The nodes of a quantum network $\mathcal{R}$ are elementary boxes
linked by wires. Elementary boxes represent state preparations,
channels, quantum operations, or effects. The most general pictorial
representation of a quantum network is a directed acyclic graph,
where the vertices represent elementary boxes and the arrows
represent the quantum systems traveling within the network in the
direction induced by the input-output relation.

By stretching the connections in the graph we can give the quantum
network the shape of a comb, i.e. any quantum network $\mathcal{R}$
is equivalent to a sequence of $N$ quantum operations
$\{\mathcal{T}_j\}_{j=1}^{N}$ with some unconnected input and output
subsystems, as follows

[Diagram: comb-shaped network with teeth $\mathcal{T}_1, \dots, \mathcal{T}_N$ and wires labeled by consecutive integers $0, 1, 2, 3, \dots, 2N-2, 2N-1$.]    (1)

If all the $N$ quantum operations are trace preserving (i.e. they
are quantum channels), $\mathcal{R}$ is a deterministic quantum
network; otherwise $\mathcal{R}$ is a probabilistic quantum network.
The ordering of the teeth is induced by the causal order defined by
the flow of quantum information inside the quantum network.
Referring to the scheme in Eq. (1), we label each wire with an
integer $j$; accordingly, the Hilbert space of the system
represented by wire $j$ is denoted by $\mathcal{H}_j$.

Since a quantum network $\mathcal{R}$ is a concatenation of quantum
operations, it can be considered as a quantum operation itself,
$\mathcal{R} : \mathcal{L}(\mathcal{H}_{even}) \to \mathcal{L}(\mathcal{H}_{odd})$,
where we defined $\mathcal{H}_{even} := \bigotimes_{j=0}^{N-1} \mathcal{H}_{2j}$
and $\mathcal{H}_{odd} := \bigotimes_{j=0}^{N-1} \mathcal{H}_{2j+1}$.
Doing so, it is possible to define the Choi-Jamiolkowski operator of
a quantum network as

$$ R := \mathcal{R} \otimes \mathcal{I} (|\omega\rangle\langle\omega|), \qquad R \in \mathcal{L}(\mathcal{H}_{even} \otimes \mathcal{H}_{odd}), \tag{2} $$

where $\mathcal{I}$ is the identity map and
$|\omega\rangle \in \mathcal{H}_{even}^{\otimes 2}$,
$|\omega\rangle = \sum_n |n\rangle|n\rangle$ ($\{|n\rangle\}$ is an
orthonormal basis of $\mathcal{H}_{even}$). The Choi-Jamiolkowski
operator of a quantum network is called the quantum comb of the
network. If $\mathcal{R}$ is a deterministic quantum network it is
possible to prove that its Choi-Jamiolkowski operator $R$ must
satisfy the recursive normalization constraint

$$ \mathrm{Tr}_{2k-1}[R^{(k)}] = I_{2k-2} \otimes R^{(k-1)}, \qquad k = 1, \dots, N, \tag{3} $$

where $R^{(N)} = R$, $R^{(0)} = 1$,
$R^{(k)} \in \mathcal{L}(\mathcal{H}_{odd_k} \otimes \mathcal{H}_{even_k})$
with $\mathcal{H}_{even_k} := \bigotimes_{j=0}^{k-1} \mathcal{H}_{2j}$ and
$\mathcal{H}_{odd_k} := \bigotimes_{j=0}^{k-1} \mathcal{H}_{2j+1}$, and
$R^{(k)}$ is the comb of the reduced circuit $\mathcal{R}^{(k)}$
obtained by discarding the last $N - k$ teeth. It is relevant to
stress that each positive operator that satisfies Eq. (3)
corresponds to a valid deterministic quantum network. This gives us
a correspondence between the set of positive operators satisfying
Eq. (3) and the set of deterministic quantum networks.

On the other hand, the Choi-Jamiolkowski operator of a probabilistic
quantum network $\mathcal{R}$ must satisfy

$$ 0 \le R \le S, \tag{4} $$

where $S$ is the Choi-Jamiolkowski operator of a deterministic
quantum network. An important theorem proves [?] that any positive
operator, upon suitable rescaling, represents a probabilistic
quantum network.

Two quantum networks $\mathcal{R}_1$ and $\mathcal{R}_2$ can be
connected by linking input wires of one network with output wires of
the other network, thus forming the network
$\mathcal{R}_1 * \mathcal{R}_2$. The Choi-Jamiolkowski operator of
the composite network $\mathcal{R}_1 * \mathcal{R}_2$ is the link
product of the operators $R_1$ and $R_2$, which is defined as
follows:

$$ R_1 * R_2 := \mathrm{Tr}_{\mathcal{K}}[R_1^{\theta_{\mathcal{K}}} R_2], \tag{5} $$

where $\theta_{\mathcal{K}}$ denotes the partial transposition (with
respect to a fixed orthonormal basis) over the Hilbert space
$\mathcal{K}$ of the connected wires and $\mathrm{Tr}_{\mathcal{K}}$
denotes the partial trace over $\mathcal{K}$.


A. Generalized Instrument

The aim of this paper is to study quantum networks that replicate
quantum measurements. A generalized quantum instrument is a set of
probabilistic quantum networks $\mathcal{R} := \{\mathcal{R}_i\}$
such that the set $\mathbf{R} = \{R_i\}$ of the Choi-Jamiolkowski
operators of its components satisfies the following condition:

$$ \sum_i R_i =: R_\Omega, \tag{6} $$

where $R_\Omega$ corresponds to a deterministic quantum network.
Every probabilistic quantum network belongs to some generalized
quantum instrument, and vice versa every generalized quantum
instrument represents some set of probabilistic quantum networks.


III. THE OPTIMIZATION PROBLEM 

The learning scenario can be formulated as a quantum
network that accepts N measurements into the open slots
and works as a POVM on the remaining system. Here is
a diagram representing the N = 2 scenario,
where the double wires carry the classical outcomes of
the measurements.

Since we consider the case where the unknown measurement is a
projective non-degenerate POVM $\mathbf{E} := \{E_1, \dots, E_d\}$,
we can write its element $E_i$ in the following form

$$ E_i = |\phi_i\rangle\langle\phi_i|, \tag{7} $$

where $\{|\phi_i\rangle\}_{i=1}^{d}$ is an orthonormal basis of the
Hilbert space $\mathcal{H}$. All the POVMs of this kind can be
generated by rotating a reference POVM
$\mathbf{E} := \{|i\rangle\langle i|\}_{i=1}^{d}$ by elements of the
group of unitary transformations $SU(d)$ as follows

$$ \mathbf{E}^{(U)} := U \mathbf{E} U^\dagger, \qquad U \in SU(d), \tag{8} $$

where $\{|i\rangle\}$ is a fixed orthonormal basis and
$U \mathbf{E} U^\dagger$ denotes the POVM with elements
$E_i^{(U)} := U E_i U^\dagger$. Notice the slight abuse in the
definition of $\mathbf{E}^{(U)}$, due to the fact that there exists
a stability subgroup $S \subset SU(d)$ such that for $V \in S$ one
has $V |i\rangle\langle i| V^\dagger = |i\rangle\langle i|$ for all
$i$. The POVM $\mathbf{E}^{(U)}$ is then labeled by the equivalence
class $[U]$ defined by the relation

$$ U \sim U' \iff U = U'V, \quad V \in S, \tag{9} $$

rather than by $U$.

It is formally convenient to encode the classical outcome $i$ of the
POVM into a quantum system by preparing the state $|i\rangle$ from a
fixed orthonormal basis, which is the same for each POVM [?]. Within
this framework the measurement device is actually described by the
following measure-and-prepare quantum channel
$\mathcal{E}^{(U)} : \mathcal{L}(\mathcal{H}) \to \mathcal{L}(\mathcal{H})$,

$$ \mathcal{E}^{(U)}(\rho) = \sum_{i=1}^{d} \mathrm{Tr}[E_i^{(U)} \rho]\, |i\rangle\langle i|, \tag{10} $$

which measures the POVM $\mathbf{E}^{(U)}$ on the input state and,
in the case of outcome $i$, prepares the state $|i\rangle$ from a
fixed orthonormal basis on the output of the channel. The
Choi-Jamiolkowski representation of the channel $\mathcal{E}^{(U)}$
is the following

$$ E^{(U)} = \sum_{i=1}^{d} |i\rangle\langle i| \otimes E_i^{(U)T} = \sum_{i=1}^{d} |i\rangle\langle i| \otimes U^* |i\rangle\langle i| U^T, \tag{11} $$

where $V^T$ denotes the transpose of $V$ in the basis
$\{|i\rangle\}_{i=1}^{d}$. The $N$ uses of the measurement device
are then represented by the tensor product
$E^{(U)}_{2N-1,2N-2} \otimes \cdots \otimes E^{(U)}_{1,0}$, where
the input and the output space of the $k$-th use of the measurement
device are denoted by $2k-2$ and $2k-1$, respectively. We introduce
the following notation

$$ \mathcal{H}_{in} := \bigotimes_{k=1}^{N} \mathcal{H}_{2k-2}, \qquad \mathcal{H}_{cl} := \bigotimes_{k=1}^{N} \mathcal{H}_{2k-1}. \tag{12} $$
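As an illustration of the measure-and-prepare channel of Eq. (10), the following Python sketch computes the classical output distribution $\mathrm{Tr}[E_i^{(U)}\rho] = \langle i|U^\dagger \rho U|i\rangle$ and the corresponding diagonal output state for a qubit. The helper names (`matmul`, `dagger`, `outcome_probabilities`, `channel_output`) and the choice of the Hadamard matrix as the rotation $U$ are illustrative assumptions, not part of the original analysis; the global phase of $U$ plays no role here.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def outcome_probabilities(u, rho):
    """Return [Tr(E_i^(U) rho)] for E_i^(U) = U|i><i|U^dagger."""
    rotated = matmul(matmul(dagger(u), rho), u)  # U^dagger rho U
    return [rotated[i][i].real for i in range(len(rho))]


def channel_output(u, rho):
    """Measure-and-prepare output: sum_i Tr(E_i^(U) rho) |i><i|."""
    probs = outcome_probabilities(u, rho)
    d = len(probs)
    return [[probs[r] if r == c else 0.0 for c in range(d)] for r in range(d)]


# Hypothetical example: basis rotated by a Hadamard, input state |0><0|.
s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
rho_zero = [[1, 0], [0, 0]]
probs = outcome_probabilities(hadamard, rho_zero)
output_state = channel_output(hadamard, rho_zero)
```

Because the output is always diagonal in the fixed basis, all coherence of the input is lost; this is why the outcome wires can later be treated as classical.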



Since we want the learning network $\mathcal{R}$ to behave as the
POVM $\mathbf{E}^{(U)}$ upon insertion of the $N$ uses of
$\mathcal{E}^{(U)}$, we have that $\mathbf{R}$ is a generalized
instrument where the element $R_i$ describes the behaviour of the
network when the output of the replicated measurement is $i$. The
replicated POVM is then equal to

$$ G_i^{(U)} = \left[ R_i * E^{(U)}_{2N-1,2N-2} * \cdots * E^{(U)}_{1,0} \right]^T, \tag{13} $$

where $\mathcal{H}_{out} := \mathcal{H}_{2N}$ denotes the input
space of the replicated measurement. In this notation the
normalization of the generalized instrument $\mathbf{R}$ becomes

$$ \mathrm{Tr}_{2k-2}[R^{(k)}] = I_{2k-3} \otimes R^{(k-1)}, \quad k = 1, \dots, N, \qquad R_\Omega = I_{2N,2N-1} \otimes R^{(N)}, \quad R^{(0)} = 1. \tag{14} $$

Our task is to find the learning network $\mathcal{R}$ such that
$\mathbf{G}^{(U)}$ is as close as possible to $\mathbf{E}^{(U)}$. In
order to quantify the performance of the replicating network, we
introduce the following quantity that measures the closeness between
two POVMs $\mathbf{P}$ and $\mathbf{Q}$

$$ \mathcal{D}(\mathbf{P}, \mathbf{Q}) := \int d\psi \sum_{i=1}^{d} \left| \langle\psi| P_i - Q_i |\psi\rangle \right|^2. \tag{15} $$

The interpretation of $\mathcal{D}(\mathbf{P}, \mathbf{Q})$ as a
measure of "distance" between $\mathbf{P}$ and $\mathbf{Q}$ is
provided by the following Lemma.

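Lemma 1 (Distance criterion for two POVMs)

Let $\mathcal{E} := \{1, \dots, d\}$ be a finite set of events and
$\mathbf{P} \subset \mathcal{L}(\mathcal{H})$ and
$\mathbf{Q} \subset \mathcal{L}(\mathcal{H})$ be two POVMs. Consider
now the quantity $\mathcal{D}(\mathbf{P}, \mathbf{Q})$ from
Eq. (15). Then the following properties hold:

i) $\mathcal{D}(\mathbf{P}, \mathbf{Q}) \ge 0$,

ii) $\mathcal{D}(\mathbf{P}, \mathbf{Q}) = 0 \iff P_i = Q_i \ \forall i$,

iii) $\mathcal{D}(\mathbf{P}, \mathbf{Q})$ is convex with respect to POVMs,

iv) $\mathcal{D}(U \mathbf{P} U^\dagger, U \mathbf{Q} U^\dagger) = \mathcal{D}(\mathbf{P}, \mathbf{Q})$ for any unitary operator $U$.

Proof. The non-negativity of the function $f(x) = x^2$ guarantees
the same property also for $\mathcal{D}$, which is a sum and an
integral of squares. For $P_i = Q_i \ \forall i$ it is obvious that
$\mathcal{D}(\mathbf{P}, \mathbf{Q}) = 0$. To prove the converse, it
suffices to realize that $\mathcal{D}(\mathbf{P}, \mathbf{Q}) = 0$
implies $\langle\psi| P_i - Q_i |\psi\rangle = 0 \ \forall \psi$,
which by the polarization identity requires $P_i = Q_i \ \forall i$.
In order to prove convexity, we need to show that

$$ \mathcal{D}(\mathbf{P}, \lambda \mathbf{Q} + (1-\lambda) \mathbf{Q}') \le \lambda \mathcal{D}(\mathbf{P}, \mathbf{Q}) + (1-\lambda) \mathcal{D}(\mathbf{P}, \mathbf{Q}') \tag{16} $$

holds for any POVM $\mathbf{Q}'$ and $0 \le \lambda \le 1$. If we
denote $a_i = \langle\psi| P_i - Q_i |\psi\rangle$,
$b_i = \langle\psi| P_i - Q'_i |\psi\rangle$ and utilize the
convexity of $f(x) = x^2$, i.e.

$$ (\lambda a_i + (1-\lambda) b_i)^2 \le \lambda a_i^2 + (1-\lambda) b_i^2, $$

then the claim follows directly from the definition in Eq. (15).
Similarly, property iv) is obvious from the definition in Eq. (15),
because the integration measure $d\psi$ is unitarily invariant. ■

Assuming that the unknown POVM $\mathbf{E}^{(U)}$ is randomly drawn
according to the Haar distribution, we choose the quantity

$$ D := \int dU\, \mathcal{D}(\mathbf{E}^{(U)}, \mathbf{G}^{(U)}) \tag{17} $$

as a figure of merit for the learning network. The quantity $D$
clearly depends on the network $\mathcal{R}$, and will be denoted by
$D[\mathcal{R}]$. Our task is to find the optimal generalized
instrument $\mathbf{R}$ that minimizes $D[\mathcal{R}]$.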
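The distance of Eq. (15) can be evaluated in closed form. The sketch below assumes the standard second-moment identity for Haar-random pure states, $\int d\psi\, \langle\psi|A|\psi\rangle^2 = (\mathrm{Tr}[A]^2 + \mathrm{Tr}[A^2]) / (d(d+1))$ for Hermitian $A$, which is not stated in the text above. For exact arithmetic the function `povm_distance` (a hypothetical name) is restricted to POVM elements given as real symmetric matrices with rational entries.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Trace of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def povm_distance(p, q):
    """Distance of Eq. (15) for POVMs with real symmetric rational elements.

    Uses the assumed Haar identity
    int dpsi <psi|A|psi>^2 = (Tr[A]^2 + Tr[A^2]) / (d (d + 1)).
    """
    d = len(p[0])
    total = Fraction(0)
    for p_i, q_i in zip(p, q):
        a = [[Fraction(p_i[r][c]) - Fraction(q_i[r][c]) for c in range(d)]
             for r in range(d)]
        total += trace(a) ** 2 + trace(matmul(a, a))
    return total / (d * (d + 1))


# Ideal qubit measurement versus a noisy replica
# Q_i = lam * P_i + (1 - lam) / 2 * I with lam = 1/2 (illustrative values).
ideal = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
lam = Fraction(1, 2)
noisy = [[[lam * e[r][c] + (1 - lam) / 2 * (1 if r == c else 0)
           for c in range(2)] for r in range(2)] for e in ideal]
distance = povm_distance(ideal, noisy)
```

For this example the distance evaluates to $(1-\lambda)^2 (d-1) / (d(d+1))$ with $d = 2$ and $\lambda = 1/2$, i.e. $1/24$.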



IV. SYMMETRIES OF THE LEARNING 
NETWORK 

In this section we utilize the symmetries of the figure of merit
(17) to simplify the optimization problem. The first simplification
relies on the fact that some wires of the network carry only
classical information, representing the outcome of the measurement.



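Lemma 2 (Restriction to diagonal network)

The optimal generalized instrument $\mathbf{R}$,
$\sum_i R_i = R_\Omega$, minimizing Eq. (17) can be chosen to
satisfy:

$$ R_i = \sum_{\mathbf{j}} |\mathbf{j}\rangle\langle\mathbf{j}|_{cl} \otimes R'_{i,\mathbf{j}}, \tag{18} $$

where $\mathbf{j} = (j_1, \dots, j_N)$,
$|\mathbf{j}\rangle_{cl} := |j_1\rangle_1 \otimes \cdots \otimes |j_N\rangle_{2N-1} \in \mathcal{H}_{cl}$,
$0 \le R'_{i,\mathbf{j}} \in \mathcal{L}(\mathcal{H}_{out} \otimes \mathcal{H}_{in})$,
and $\sum_{\mathbf{j}}$ is a shorthand for
$\sum_{j_1, \dots, j_N = 1}^{d}$.
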

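Proof. Let $\mathbf{S}$ be a generalized instrument corresponding to
a quantum network $\mathcal{S}$. Let us define the set of operators
$\mathbf{R}$ as

$$ R_i := \sum_{\mathbf{j}} |\mathbf{j}\rangle\langle\mathbf{j}|_{cl} \otimes R'_{i,\mathbf{j}}, \tag{19} $$

with $R'_{i,\mathbf{j}} := \langle\mathbf{j}|_{cl}\, S_i\, |\mathbf{j}\rangle_{cl}$.
We can easily prove that $\mathbf{R}$ is a generalized instrument.
Indeed, recalling Eq. (11), we have

$$ \sum_i R_i = \sum_i \sum_{\mathbf{j}} R'_{i,\mathbf{j}} \otimes |\mathbf{j}\rangle\langle\mathbf{j}|_{cl} = \sum_{\mathbf{j}} \langle\mathbf{j}|_{cl}\, S_\Omega\, |\mathbf{j}\rangle_{cl} \otimes |\mathbf{j}\rangle\langle\mathbf{j}|_{cl} = S_\Omega * E^{(I)}, \tag{20} $$

where the link is performed only on the space $\mathcal{H}_{cl}$.
The operator in Eq. (20) is the Choi-Jamiolkowski operator of a
deterministic quantum network satisfying the same normalization
conditions as $S_\Omega$. Finally we show that $\mathbf{S}$ and
$\mathbf{R}$ produce the same replicated POVM $\mathbf{G}^{(U)}$
when linked with the $N$ uses of $E^{(U)}$, as follows

$$ S_i * E^{(U)}_{2N-1,2N-2} * \cdots * E^{(U)}_{1,0} = \sum_{\mathbf{j}} \left( \langle\mathbf{j}|_{cl} \langle\mathbf{j}|_{in} U^{\dagger \otimes N} \right) S_i \left( |\mathbf{j}\rangle_{cl}\, U^{\otimes N} |\mathbf{j}\rangle_{in} \right) = \sum_{\mathbf{j}} \left( \langle\mathbf{j}|_{in} U^{\dagger \otimes N} \right) R'_{i,\mathbf{j}} \left( U^{\otimes N} |\mathbf{j}\rangle_{in} \right) = R_i * E^{(U)}_{2N-1,2N-2} * \cdots * E^{(U)}_{1,0}. \tag{21} $$
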



It is clear from Eq. (21) that also for non-diagonal networks
$\mathcal{R}$, the only relevant terms of the generalized instrument,
both for its normalization and for the figure of merit
$D[\mathcal{R}]$, are

$$ R'_{i,\mathbf{j}} := \langle\mathbf{j}|_{cl}\, R_i\, |\mathbf{j}\rangle_{cl}. \tag{22} $$

In the following we will use the above notation also for general
networks. As a next step, we introduce a unitary symmetry of the
learning network and we study its consequences on the form of the
replicated POVM. We will show that the restriction to covariant
learning networks can be made without loss of generality. For this
purpose we introduce the following lemma.

Lemma 3 (Covariant networks) The optimal generalized instrument
$\mathbf{R}$, $\sum_i R_i = R_\Omega$, minimizing Eq. (17) can be
chosen to satisfy

$$ [R_i,\; U^*_{out} \otimes U^{\otimes N}_{in} \otimes I_{cl}] = 0. \tag{23} $$

Then the replicated POVM for $\mathbf{R}$ enjoys the following
property

$$ G_i^{(U)} = U G_i^{(I)} U^\dagger. \tag{24} $$


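Proof. From an arbitrary learning network $\mathcal{S}$ we can
define, by symmetrization, a covariant learning network
$\mathcal{R}$ as follows

$$ R_i := \int dU\, (U^* \otimes U^{\otimes N} \otimes I_{cl})\, S_i\, (U^T \otimes U^{\dagger \otimes N} \otimes I_{cl}). \tag{25} $$

It is easy to verify that the set $\mathbf{R}$ defines a generalized
instrument. Moreover, by the invariance of the Haar measure $dU$,
the elements of $\mathbf{R}$ obey Eq. (23). First we show that the
replicated POVM for the symmetrized instrument $\mathbf{R}$ enjoys
the property (24). Indeed, Eq. (21) provides the following formula
for the replicated POVM

$$ G_i^{(U)T} = \sum_{\mathbf{j}} \langle\mathbf{j}|_{in} U^{\dagger \otimes N}\, R'_{i,\mathbf{j}}\, U^{\otimes N} |\mathbf{j}\rangle_{in}, \tag{26} $$

and exploiting the expression in Eq. (25) for $R'_{i,\mathbf{j}}$ we
obtain

$$ G_i^{(U)} = \int dW\, W Q_i^{(W^\dagger U)} W^\dagger, \tag{27} $$

where $\mathbf{Q}^{(U)}$ denotes the replicated POVM for the
original learning network $\mathcal{S}$. Eq. (24) is a direct
consequence of Eq. (27), which can be seen via a suitable shift of
the invariant Haar integration measure. We can now show that
$D[\mathcal{R}] \le D[\mathcal{S}]$ as follows

$$ D[\mathcal{R}] = \int dU\, \mathcal{D}\!\left( \mathbf{E}^{(U)}, \int dW\, W \mathbf{Q}^{(W^\dagger U)} W^\dagger \right) \le \int dW\, dU\, \mathcal{D}\!\left( \mathbf{E}^{(U)}, W \mathbf{Q}^{(W^\dagger U)} W^\dagger \right) = \int dU\, dW\, \mathcal{D}\!\left( W \mathbf{E}^{(U)} W^\dagger, W \mathbf{Q}^{(U)} W^\dagger \right) = D[\mathcal{S}], $$

where we used properties iii) and iv) of $\mathcal{D}$ and shifted
the Haar invariant integration measure $dU$ to $d(W^\dagger U)$. ■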

Another symmetry we introduce is related to the possibility of
relabeling the outcomes of a POVM. We shall denote by $\sigma$ an
element of $S_d$, the group of permutations of $d$ elements, and by
$T_\sigma$ the linear operator that permutes the elements of the
basis $\{|i\rangle\}$ according to this permutation, in formula
$T_\sigma |i\rangle = |\sigma(i)\rangle$. Let us note that the
complex conjugation and transposition are defined with respect to
this basis, so $T_\sigma = T_\sigma^*$.
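A small Python sketch can make the relabeling operators concrete. It builds $T_\sigma$ as a permutation matrix and checks, for every $\sigma \in S_d$, that $T_\sigma|i\rangle = |\sigma(i)\rangle$ and that $T_\sigma T_\sigma^T = I$ (the entries are real, so $T_\sigma = T_\sigma^*$ holds by construction). The function names and the zero-based labeling of outcomes are illustrative assumptions.

```python
from itertools import permutations


def permutation_matrix(sigma):
    """Return T_sigma, with T_sigma|i> = |sigma(i)>, as a list of rows."""
    d = len(sigma)
    return [[1 if sigma[col] == row else 0 for col in range(d)]
            for row in range(d)]


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def basis(i, d):
    """Computational basis vector |i> in dimension d."""
    return [1 if k == i else 0 for k in range(d)]


def is_valid_relabeling(sigma):
    """Check T|i> = |sigma(i)> for all i and T T^T = I."""
    d = len(sigma)
    t = permutation_matrix(sigma)
    maps_basis = all(apply(t, basis(i, d)) == basis(sigma[i], d)
                     for i in range(d))
    orthogonal = all(
        sum(t[r][k] * t[c][k] for k in range(d)) == (1 if r == c else 0)
        for r in range(d) for c in range(d))
    return maps_basis and orthogonal


all_valid = all(is_valid_relabeling(s) for s in permutations(range(3)))
swap = permutation_matrix((1, 0))
```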

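Lemma 4 (Relabeling symmetry) The optimal covariant generalized
instrument $\mathbf{R}$, $\sum_i R_i = R_\Omega$, minimizing
Eq. (17) can be chosen to satisfy Eq. (23) and the following
condition

$$ R_i = (I_{out} \otimes I_{in} \otimes T_\sigma^{\dagger \otimes N})\, R_{\sigma(i)}\, (I_{out} \otimes I_{in} \otimes T_\sigma^{\otimes N}), \tag{28} $$

where $\sigma(\mathbf{j}) := (\sigma(j_1), \dots, \sigma(j_N))$.
Then the seed of the replicated POVM satisfies

$$ \mathbf{G}^{(I)}_\sigma = T_\sigma \mathbf{G}^{(I)} T_\sigma^\dagger \qquad \forall \sigma \in S_d, \tag{29} $$

where $\mathbf{X}_\sigma$ denotes the ordered set with elements
$(\mathbf{X}_\sigma)_i := X_{\sigma(i)}$.

Proof. For a given covariant learning network $\mathcal{S}$
satisfying Eq. (23), let us define

$$ R_i := \frac{1}{d!} \sum_\sigma (I_{out} \otimes I_{in} \otimes T_\sigma^{\dagger \otimes N})\, S_{\sigma(i)}\, (I_{out} \otimes I_{in} \otimes T_\sigma^{\otimes N}) = \frac{1}{d!} \sum_\sigma (T_\sigma^\dagger \otimes T_\sigma^{\dagger \otimes N} \otimes T_\sigma^{\dagger \otimes N})\, S_{\sigma(i)}\, (T_\sigma \otimes T_\sigma^{\otimes N} \otimes T_\sigma^{\otimes N}), \tag{30} $$

where the last identity follows from the commutation relation (23)
with $U = T_\sigma$. The generalized instrument $\mathbf{R}$
corresponds to a covariant quantum network $\mathcal{R}$, because it
represents a convex combination of well-normalized covariant
networks. The quantum network $\mathcal{R}$ operationally
corresponds to a random simultaneous relabeling of the outcomes of
the inserted and replicated measurements by a permutation $\sigma$.
Let us now prove Eq. (29).

Since the generalized instrument $\mathbf{R}$ inherits the
commutation property (23) from $\mathbf{S}$ (see definition (30)),
it is obvious that the introduced permutation symmetry will not
spoil the existing covariance from Eq. (24). Thus, it suffices to
investigate how the seed of the replicated POVM changes when we
introduce the permutation symmetry.

Inserting definition (30) into Eq. (21) we find

$$ (G_i^{(I)})^T = \frac{1}{d!} \sum_\sigma T_\sigma^\dagger \Big[ \sum_{\mathbf{j}} \langle\mathbf{j}|_{in}\, S'_{\sigma(i),\mathbf{j}}\, |\mathbf{j}\rangle_{in} \Big] T_\sigma = \frac{1}{d!} \sum_\sigma T_\sigma^\dagger\, (Q^{(I)}_{\sigma(i)})^T\, T_\sigma, \tag{31} $$

where we defined
$S'_{i,\mathbf{j}} := \langle\mathbf{j}|_{cl}\, S_i\, |\mathbf{j}\rangle_{cl}$,
and we denoted by $\mathbf{Q}^{(I)}$ the POVM replicated by the
original learning network $\mathcal{S}$. Transposing the last
equation one can easily derive Eq. (29) by analyzing the conjugation
with $T_\tau$.

As a next step, we show that $D[\mathcal{R}] \le D[\mathcal{S}]$.
Indeed,

$$ D[\mathcal{R}] \le \frac{1}{d!} \sum_\sigma \int dU\, \mathcal{D}(\mathbf{E}^{(U)}_\sigma, \mathbf{Q}^{(U)}_\sigma) = \frac{1}{d!} \sum_\sigma \int dU\, \mathcal{D}(\mathbf{E}^{(U)}, \mathbf{Q}^{(U)}) = D[\mathcal{S}], $$

where we utilized Eq. (31), the convexity of
$\mathcal{D}(\mathbf{E}^{(U)}, \mathbf{G}^{(U)})$, and the fact that
$\mathcal{D}(\mathbf{E}^{(U)}_\sigma, \mathbf{Q}^{(U)}_\sigma) = \mathcal{D}(\mathbf{E}^{(U)}, \mathbf{Q}^{(U)}) \ \forall \sigma \in S_d$.
Finally, it is easy to prove that under the condition Eq. (30),
$R_i$ satisfy Eq. (28). ■

The advantage of using the relabeling symmetry is the reduction of
the number of independent parameters of the generalized quantum
instrument. Combining Eq. (22) with Eq. (28) we have that

$$ R'_{\sigma(i),\sigma(\mathbf{j})} = R'_{i,\mathbf{j}}. \tag{32} $$

Let us now define the equivalence relation between strings
$i,\mathbf{j}$ and $i',\mathbf{j}'$ as

$$ i,\mathbf{j} \sim i',\mathbf{j}' \iff i = \sigma(i') \,\wedge\, \mathbf{j} = \sigma(\mathbf{j}') \tag{33} $$
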


for some permutation $\sigma$. Thanks to Eq. (32) there are only as
many independent $R'_{i,\mathbf{j}}$ as there are equivalence
classes among the sequences $(i,\mathbf{j})$. In the simplest case
of $N = 1$ and arbitrary dimension $d \ge 2$, there are only two
classes, which we denote by $xx$ and $xy$. The reason is that for
any couple $i,j$ there is either a permutation $\sigma$ such that
$\sigma(i),\sigma(j) = 1,1$ or one such that
$\sigma(i),\sigma(j) = 1,2$; thus the classes are defined by the
conditions $i = j$ or $i \neq j$, respectively. For the case $N = 2$
the vector $i,\mathbf{j}$ has three components. Then there are four
or five equivalence classes, depending on the dimension being
$d = 2$ or $d > 2$, respectively. We denote these equivalence
classes by $xxx, xxy, xyx, xyy, xyz$ and the set of these elements
by $L$. In the general case, it is clear that the number of classes
is given by the number of partitions of a set of cardinality $N + 1$
into $p$ disjoint parts, with $p \le d$.
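The counting of equivalence classes can be checked with a short Python sketch. It counts the classes in two ways: by brute force, mapping every string $(i, j_1, \dots, j_N)$ to a canonical pattern under relabeling, and through Stirling numbers of the second kind, which count partitions of a set into a given number of parts. The function names are hypothetical, and outcomes are labeled from 0 for convenience.

```python
from functools import lru_cache
from itertools import product


def pattern(string):
    """Canonical relabeling: first distinct symbol -> 0, the next -> 1, ..."""
    seen = {}
    return tuple(seen.setdefault(s, len(seen)) for s in string)


def count_classes_bruteforce(n_uses, d):
    """Number of classes of strings (i, j_1..j_N) under outcome relabeling."""
    return len({pattern(t) for t in product(range(d), repeat=n_uses + 1)})


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_classes(n_uses, d):
    """Partitions of a set of N + 1 elements into at most d parts."""
    size = n_uses + 1
    return sum(stirling2(size, p) for p in range(1, min(d, size) + 1))


counts = {(n, d): count_classes(n, d) for n in (1, 2, 3) for d in (2, 3, 4)}
```

For $N = 1$ this gives two classes for any $d \ge 2$, and for $N = 2$ it gives four classes when $d = 2$ and five when $d > 2$, in agreement with the text.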

It is useful to introduce the notation

$$ R_{x,\mathbf{y}} := R'_{i,\mathbf{j}} = R'_{\sigma(i),\sigma(\mathbf{j})}, \tag{34} $$

where $(x,\mathbf{y})$ is a string of indices that represents one
equivalence class. We will denote by $L$ the set of equivalence
classes, $L := \{(x,\mathbf{y})\}$, and we will use letters from the
beginning of the alphabet to name an arbitrary element of $L$ in
situations where $N$ is fixed. For example, when $N = 1$,
$(a,b) \in L = \{(x,x),(x,y)\}$.

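As a consequence of Lemma 3, Eq. (23) can be written as

$$ [R_{x,\mathbf{y}},\; U^*_{out} \otimes U^{\otimes N}_{in}] = 0. \tag{35} $$

By Schur's lemmas this implies the following structure for the
operators $R_{x,\mathbf{y}}$

$$ R_{x,\mathbf{y}} = \bigoplus_\nu P^\nu \otimes r^\nu_{x,\mathbf{y}}, \tag{36} $$

where $\nu$ labels the irreducible representations in the
Clebsch-Gordan series of $U^*_{out} \otimes U^{\otimes N}_{in}$,
$P^\nu$ acts as the identity on the invariant subspaces
$\mathcal{H}_\nu$ of the representations $\nu$, while
$r^\nu_{x,\mathbf{y}}$ acts on the multiplicity space
$\mathbb{C}^{m_\nu}$ of the same representation.
In the simplest case $N = 1$ we have

$$ R_{a,b} = P^p r^p_{a,b} + P^q r^q_{a,b}, \tag{37} $$

where

$$ P^p := \frac{1}{d} |\omega\rangle\langle\omega|, \qquad P^q := I - P^p, \tag{38} $$

and $r^p_{a,b}$ and $r^q_{a,b}$ are non-negative numbers due to
$R_{a,b} \ge 0$. In the case $N = 2$ we have two different
decompositions, depending on whether $d = 2$ or $d > 2$. In the
former case, we have

$$ R_{x,\mathbf{y}} = P^\alpha \otimes r^\alpha_{x,\mathbf{y}} + P^\beta r^\beta_{x,\mathbf{y}}, \tag{39} $$

where $r^\alpha_{x,\mathbf{y}}$ is a positive $2 \times 2$ matrix,
while $r^\beta_{x,\mathbf{y}}$ is a non-negative real number. The
projections $P^\nu$ on the invariant spaces of the representation
$U^* \otimes U \otimes U$ are the following

$$ P^\alpha \otimes |a\rangle\langle b| = \sum_{m=1}^{d} |\psi^a_m\rangle\langle\psi^b_m|, \quad a,b \in \{+,-\}, \qquad P^\beta = I \otimes P^+ - P^\alpha \otimes |+\rangle\langle +|, \tag{40} $$

where $|\psi^\pm_m\rangle = (|\omega\rangle|m\rangle \pm |m\rangle|\omega\rangle) / [2(d \pm 1)]^{1/2}$,
and $P^+$, $P^-$ are the projections onto the symmetric and
antisymmetric subspace, respectively. When $d > 2$, on the other
hand, we have

$$ R_{x,\mathbf{y}} = P^\alpha \otimes r^\alpha_{x,\mathbf{y}} + P^\beta r^\beta_{x,\mathbf{y}} + P^\gamma r^\gamma_{x,\mathbf{y}}, \tag{41} $$

where $r^\alpha_{x,\mathbf{y}}$ is a positive $2 \times 2$ matrix,
while $r^\beta_{x,\mathbf{y}}$ and $r^\gamma_{x,\mathbf{y}}$ are
non-negative real numbers. The projections $P^\nu$ on the invariant
spaces of the representation $U^* \otimes U \otimes U$ are the
following

$$ P^\alpha \otimes |a\rangle\langle b| = \sum_{m=1}^{d} |\psi^a_m\rangle\langle\psi^b_m|, \quad a,b \in \{+,-\}, $$
$$ P^\beta = I \otimes P^+ - P^\alpha \otimes |+\rangle\langle +|, \qquad P^\gamma = I \otimes P^- - P^\alpha \otimes |-\rangle\langle -|. \tag{42} $$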



The introduced symmetries have a deep influence on the structure of
the replicated POVM, as we show in the following lemma.

Lemma 5 The properties (18), (23) and (28) induce the following
structure of the replicated POVMs:

$$ G_i^{(U)} = \lambda\, U |i\rangle\langle i| U^\dagger + \frac{1-\lambda}{d}\, I, \tag{43} $$

which can be seen as a random mixture of a perfect replica with a
trivial measurement (i.e. a measurement producing equiprobably any
of the outcomes) with mixing coefficient $\lambda$, which is a
function of $\mathbf{R}$.

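Proof. Because of the property (24) it is sufficient to prove the
statement for $U = I$. Since
$(G_i^{(I)})^T = \sum_{\mathbf{j}} \langle\mathbf{j}|\, R'_{i,\mathbf{j}}\, |\mathbf{j}\rangle$
(see Eq. (21)) we have:

$$ \langle k| G_i^{(I)} |l\rangle = \langle l| \sum_{\mathbf{j}} \langle\mathbf{j}|\, R'_{i,\mathbf{j}}\, |\mathbf{j}\rangle |k\rangle = \sum_{\mathbf{j}} \mathrm{Tr}\Big[ R'_{i,\mathbf{j}} \int dU\, (U^{\otimes N} \otimes U^*)\, |\mathbf{j}k\rangle\langle\mathbf{j}l|\, (U^{\otimes N} \otimes U^*)^\dagger \Big] = \sum_{\mathbf{j}} \mathrm{Tr}\Big[ R'_{i,\mathbf{j}} \Big( \int dU\, U^{\otimes N+1}\, |\mathbf{j}l\rangle\langle\mathbf{j}k|\, U^{\dagger \otimes N+1} \Big)^{\theta_{2N}} \Big], \tag{44} $$

where we used the property (23) to insert the average over $U$, and
$\theta_{2N}$ denotes the partial transpose on $\mathcal{H}_{2N}$.
Thanks to Schur's lemmas we have

$$ \int dU\, U^{\otimes N+1}\, |\mathbf{j}l\rangle\langle\mathbf{j}k|\, U^{\dagger \otimes N+1} = \bigoplus_\nu P^\nu \otimes O^\nu_{\mathbf{j},l,k}, $$

where

$$ O^\nu_{\mathbf{j},l,k} := \frac{1}{d_\nu} \mathrm{Tr}_{\mathcal{H}_\nu}\big[ (P^\nu \otimes I_{m_\nu})\, |\mathbf{j}l\rangle\langle\mathbf{j}k| \big], \qquad d_\nu := \dim \mathcal{H}_\nu. $$

We now notice that for $k \neq l$, $\{\mathbf{j},k\}$ and
$\{\mathbf{j},l\}$ are two different sets of indices, and then there
exists no permutation $S$ of the $N+1$ tensor factors such that
$\langle\mathbf{j},k|\, S\, |\mathbf{j},l\rangle \neq 0$. Since any
operator of the form $P^\nu \otimes A$,
$A \in \mathcal{L}(\mathbb{C}^{m_\nu})$, can be written as a linear
combination of permutations, $P^\nu \otimes A = \sum_n a_n S_n$, we
have

$$ \mathrm{Tr}\big[ (P^\nu \otimes A)\, |\mathbf{j}l\rangle\langle\mathbf{j}k| \big] = \sum_n a_n \langle\mathbf{j}k|\, S_n\, |\mathbf{j}l\rangle = 0 \tag{45} $$

for $k \neq l$. From Eq. (45) it follows for all $k \neq l$ that
$O^\nu_{\mathbf{j},l,k} = 0$ and hence also

$$ \langle k| G_i^{(I)} |l\rangle = 0. \tag{46} $$

Recalling Eq. (29) we have

$$ G_i^{(I)} = T_\sigma G_i^{(I)} T_\sigma^\dagger \qquad \forall \sigma \in S_d \ \text{s.t.}\ \sigma(i) = i. $$

This implies

$$ \langle k| G_i^{(I)} |k\rangle = \langle l| G_i^{(I)} |l\rangle \qquad \forall k,l \neq i. \tag{47} $$

Eq. (47) combined with Eqs. (46) and (29) finally leads to

$$ G_i^{(I)} = \lambda |i\rangle\langle i| + \frac{1-\lambda}{d}\, I, \qquad 0 \le \lambda \le 1, \tag{48} $$

where $\lambda$ is a function of $\mathbf{R}$. Rewriting Eq. (48)
one has

$$ \lambda = \frac{d \langle i| G_i^{(I)} |i\rangle - 1}{d - 1}. \tag{49} $$

Let us note that $\langle i| G_i^{(I)} |i\rangle$ has the same
value independently of $i$. ■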

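We have shown that the optimization can be restricted without loss
of generality to learning networks obeying Eqs. (18), (23) and (28).
In the rest of the paper we always assume that all the considered
networks have these properties. This allows us to express the figure
of merit $D[\mathcal{R}]$ in a different form that is more useful
for calculations. The expression (43) for the replicated POVM allows
us to write

$$ D[\mathcal{R}] = \int dU\, \mathcal{D}(\mathbf{E}^{(U)}, \mathbf{G}^{(U)}) = (1-\lambda)^2 \sum_i \int dU\, d\psi\, \Big| \langle\psi| \Big( U |i\rangle\langle i| U^\dagger - \frac{I}{d} \Big) |\psi\rangle \Big|^2 $$
$$ = (1-\lambda)^2 \sum_i \int dU\, \Big[ |\langle 0| U |i\rangle|^4 - \frac{2}{d} |\langle 0| U |i\rangle|^2 + \frac{1}{d^2} \Big] = (1-\lambda)^2\, \frac{d-1}{d(d+1)}, $$

where the last equality uses the Haar averages
$\int dU\, |\langle 0|U|i\rangle|^2 = 1/d$ and
$\int dU\, |\langle 0|U|i\rangle|^4 = 2/(d(d+1))$.

It is now clear that minimization of the figure of merit
$D[\mathcal{R}]$ is equivalent to the maximization of the parameter
$\lambda = \lambda[\mathbf{R}]$, which is by Eq. (49) directly
related to the maximization of the following quantity:

$$ F[\mathcal{R}] := \frac{1}{d} \sum_i \langle i| G_i^{(I)} |i\rangle = \langle i| G_i^{(I)} |i\rangle \quad \forall i. \tag{50} $$

The relation between $D[\mathcal{R}]$ and $F[\mathcal{R}]$ is given
by the following equation

$$ D[\mathcal{R}] = \frac{d}{d^2 - 1}\, (1 - F[\mathcal{R}])^2. \tag{51} $$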
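The relations between $F$, $\lambda$ and $D$ can be cross-checked with exact rational arithmetic. The sketch below assumes the closed forms $\lambda = (dF - 1)/(d-1)$ from Eq. (49), $D = (1-\lambda)^2 (d-1)/(d(d+1))$ from the last step of the derivation above, and $D = d(1-F)^2/(d^2-1)$ for Eq. (51); the function names are hypothetical.

```python
from fractions import Fraction


def mixing_coefficient(f, d):
    """lambda as a function of F, following Eq. (49)."""
    return (d * Fraction(f) - 1) / Fraction(d - 1)


def distance_from_lambda(lam, d):
    """D = (1 - lambda)^2 (d - 1) / (d (d + 1))."""
    return (1 - Fraction(lam)) ** 2 * Fraction(d - 1, d * (d + 1))


def distance_from_f(f, d):
    """D = d (1 - F)^2 / (d^2 - 1), the assumed form of Eq. (51)."""
    return Fraction(d, d * d - 1) * (1 - Fraction(f)) ** 2


# Illustrative values: a qubit with F = 3/4.
lam_qubit = mixing_coefficient(Fraction(3, 4), 2)
d_qubit = distance_from_f(Fraction(3, 4), 2)

# The two expressions for D agree for any F and d.
consistent = all(
    distance_from_lambda(mixing_coefficient(Fraction(n, 10), d), d)
    == distance_from_f(Fraction(n, 10), d)
    for d in range(2, 7) for n in range(0, 11))
```

A perfect replica ($F = 1$) gives $\lambda = 1$ and $D = 0$, while the trivial measurement ($F = 1/d$) gives $\lambda = 0$.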



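The quantity $F[\mathcal{R}]$, which we actually need to maximize,
can finally be written using Eqs. (50), (34), (26) as

$$ F[\mathcal{R}] = \frac{1}{d} \sum_i \sum_{\mathbf{j}} \langle i|_{out} \langle\mathbf{j}|_{in}\, R'_{i,\mathbf{j}}\, |i\rangle_{out} |\mathbf{j}\rangle_{in} = \frac{1}{d} \sum_{(x,\mathbf{y}) \in L} n(x,\mathbf{y})\, \langle R_{x,\mathbf{y}} \rangle, \tag{52} $$

where $n(x,\mathbf{y})$ is the cardinality of the equivalence class
denoted by the couple $(x,\mathbf{y})$, and
$\langle R_{x,\mathbf{y}} \rangle = \langle i| \langle\mathbf{j}|\, R'_{i,\mathbf{j}}\, |i\rangle |\mathbf{j}\rangle$
for any string $i,\mathbf{j}$ in the equivalence class denoted by
$(x,\mathbf{y})$.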



V. OPTIMIZATION 

In this section we derive optimal quantum learning of 
a von Neumann measurement for the scenarios analyzed 
in the following subsections. 



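A. 1 → 1 Learning

Suppose that today we are provided with a single use of a
measurement device, and we need its replica to measure a state that
will be prepared only tomorrow. Such a scenario is described by the
following scheme.

[Diagram: learning network for a single use of the unknown measurement, with wires labeled 0, 1 and 2.]    (53)

Using the labeling from Eq. (53) and the results of Section IV for
$N = 1$, we have

$$ L = \{(x,x), (x,y)\}, \qquad R_i = |i\rangle\langle i|_1 \otimes R_{x,x} + (I - |i\rangle\langle i|)_1 \otimes R_{x,y}. \tag{54} $$

We use the identity
$\langle i| \langle j|\, P^p\, |i\rangle |j\rangle = \delta_{ij}/d$,
$n(x,x) = d$ and $n(x,y) = d(d-1)$, to rewrite the figure of merit
$F[\mathcal{R}] = \langle i| G_i^{(I)} |i\rangle$ of Eq. (52) as

$$ F = \langle R_{x,x} \rangle + (d-1) \langle R_{x,y} \rangle = \sum_{\nu \in \{p,q\}} \big[ \Delta^\nu_{x,x} r^\nu_{x,x} + (d-1) \Delta^\nu_{x,y} r^\nu_{x,y} \big], \tag{55} $$

where $\Delta^p_{x,x} = 1/d$, $\Delta^p_{x,y} = 0$, and
$\Delta^q_{a,b} = 1 - \Delta^p_{a,b}$. Let us now write the
normalization conditions for the generalized instrument in terms of
the operators $R'_{i,j}$. We have that $R_\Omega := \sum_i R_i$ has
to be the Choi-Jamiolkowski operator of a deterministic quantum
network and must satisfy Eq. (14), that is

$$ R_\Omega = I_{21} \otimes \rho, \qquad \mathrm{Tr}[\rho] = 1, \quad \rho \ge 0. \tag{56} $$

The commutation relation (23) implies $[\rho, U] = 0$ and
consequently Schur's lemma requires $\rho = \frac{1}{d} I$. We take
this into account in Eq. (56) and with the help of Eq. (54) we get

$$ I_1 \otimes R_{x,x} + (d-1)\, I_1 \otimes R_{x,y} = \frac{1}{d}\, I, \tag{57} $$

which can be equivalently written as (see Eq. (54))

$$ r^p_{x,x} + (d-1)\, r^p_{x,y} = r^q_{x,x} + (d-1)\, r^q_{x,y} = \frac{1}{d}. \tag{58} $$

The above constraint implies the following bound

$$ F \le \sum_\nu \bar{\Delta}^\nu \big( r^\nu_{x,x} + (d-1)\, r^\nu_{x,y} \big) = \frac{d+1}{d^2}, \tag{59} $$

where $\bar{\Delta}^\nu := \max_{(a,b) \in L} \Delta^\nu_{a,b}$.
This bound is achieved by

$$ r^p_{x,x} = \frac{1}{d}, \qquad r^q_{x,y} = \frac{1}{d(d-1)}, \qquad r^q_{x,x} = r^p_{x,y} = 0, $$

which corresponds to a generalized instrument

$$ R_i = |i\rangle\langle i|_1 \otimes \frac{1}{d} P^p + (I - |i\rangle\langle i|)_1 \otimes \frac{1}{d(d-1)} P^q \tag{60} $$

that replicates the original von Neumann measurement as

$$ G_i^{(U)} = \lambda\, U |i\rangle\langle i| U^\dagger + \frac{1-\lambda}{d}\, I, \qquad \lambda = \frac{1}{d(d-1)}. \tag{61} $$

Based on Eq. (51) we conclude that the optimal value of
$D[\mathcal{R}]$ achieved by the aforementioned network is

$$ D_{opt} = \frac{d}{d^2 - 1} \Big( 1 - \frac{d+1}{d^2} \Big)^2. \tag{62} $$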
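The bound of Eq. (59) can be checked numerically. The sketch below assumes the form of the figure of merit $F = \sum_\nu [\Delta^\nu_{x,x} r^\nu_{x,x} + (d-1)\Delta^\nu_{x,y} r^\nu_{x,y}]$ with $\Delta^p_{x,x} = 1/d$, $\Delta^p_{x,y} = 0$, $\Delta^q = 1 - \Delta^p$, together with the normalization $r^\nu_{x,x} + (d-1) r^\nu_{x,y} = 1/d$. It scans a rational grid of admissible values of $r^p_{x,x}$ and $r^q_{x,x}$; the grid search and the function names are illustrative choices, not the method used in the paper.

```python
from fractions import Fraction


def fidelity(r_p_xx, r_q_xx, d):
    """F for N = 1, with r_xy fixed by r_xx + (d - 1) r_xy = 1/d."""
    r_p_xy = (Fraction(1, d) - r_p_xx) / (d - 1)
    r_q_xy = (Fraction(1, d) - r_q_xx) / (d - 1)
    delta_p_xx, delta_p_xy = Fraction(1, d), Fraction(0)
    delta_q_xx, delta_q_xy = 1 - delta_p_xx, 1 - delta_p_xy
    return (delta_p_xx * r_p_xx + (d - 1) * delta_p_xy * r_p_xy
            + delta_q_xx * r_q_xx + (d - 1) * delta_q_xy * r_q_xy)


def best_on_grid(d, steps=10):
    """Maximize F over a grid of r_xx values in [0, 1/d] (both sectors)."""
    grid = [Fraction(k, steps * d) for k in range(steps + 1)]
    return max((fidelity(a, b, d), a, b) for a in grid for b in grid)


best_qubit = best_on_grid(2)
best_qutrit = best_on_grid(3)
```

On the grid the maximum is attained at $r^p_{x,x} = 1/d$ and $r^q_{x,x} = 0$, where $F = (d+1)/d^2$.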



The optimal learning strategy can be realized by the following
network

[Diagram: optimal 1 → 1 learning network.]    (63)

that operates as follows. The storing part of the strategy consists
of preparing a maximally entangled state and measuring one part of
it with the unknown measurement that we want to learn. Application
of the learned POVM to some system $\mathcal{H}_2$ is achieved by
measuring the two-outcome POVM $\mathbf{P} := \{P^p, P^q\}$ on that
system and on the unmeasured part of the entangled state. The last
step of the optimal learning strategy consists in a classical
processing $f$ of the outcome $k$ of $\mathbf{E}^{(U)}$ and of the
outcome $n$ of $\mathbf{P}$. The function $f$ that produces the
actual outcome of the replicated measurement is defined as follows

$$ f(k,n) = \begin{cases} k & \text{if } n = p \\ j \neq k & \text{if } n = q \end{cases} \tag{64} $$

where the outcome $j$ in the second case is randomly generated with
flat distribution.
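The classical post-processing of Eq. (64) can be followed step by step in a short Python sketch. It assumes, as a consequence of the maximally entangled resource and of $P^p = |\omega\rangle\langle\omega|/d$, that the stored outcome $k$ occurs with probability $1/d$ and that, given $k$, the outcome $n = p$ occurs with probability $\langle\phi_k|\rho|\phi_k\rangle / d$; these two probabilities are derived here for illustration and are not quoted from the text. The input is the list of ideal probabilities $\langle\phi_i|\rho|\phi_i\rangle$, and the function names are hypothetical.

```python
from fractions import Fraction


def replica_distribution(true_probs):
    """Outcome distribution of the replica built from Eq. (64)."""
    d = len(true_probs)
    out = [Fraction(0)] * d
    for k, p_k in enumerate(true_probs):
        weight = Fraction(1, d)        # assumed P(k): stored outcome is uniform
        p_success = Fraction(p_k) / d  # assumed P(n = p | k)
        out[k] += weight * p_success   # n = p: report k itself
        for j in range(d):
            if j != k:                 # n = q: report a uniformly random j != k
                out[j] += weight * (1 - p_success) / (d - 1)
    return out


def mixture(true_probs):
    """lambda * p_i + (1 - lambda) / d with lambda = 1 / (d (d - 1))."""
    d = len(true_probs)
    lam = Fraction(1, d * (d - 1))
    return [lam * Fraction(p) + (1 - lam) / d for p in true_probs]


qubit = replica_distribution([1, 0])
qutrit_input = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
qutrit = replica_distribution(qutrit_input)
```

Under these assumptions the protocol reproduces the mixture of a perfect replica and a trivial measurement with $\lambda = 1/(d(d-1))$.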

When the outcome $n = p$ of the measurement $\mathbf{P}$ occurs, the
input state is effectively teleported to the past, that is, onto the
half of the entangled state that was measured by the unknown device.
In this sense the optimal 1 → 1 learning is achieved using
probabilistic teleportation [ref!!!]. We stress that the optimal
scheme differs from the one in which one optimally estimates
$\mathbf{E}^{(U)}$ and then reproduces the estimated POVM. In
contrast to the optimal learning of unitaries, it is possible to
prove that the optimal estimate-and-prepare strategy for
measurements achieves strictly lower performance than the strategy
derived in this section.



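B. 2 → 1 Learning

We now consider the case in which we have two uses of the unknown
von Neumann measurement at our disposal

[Diagram: learning network with two uses of the unknown measurement, with wires labeled 0 to 4.]    (65)

As a consequence of the symmetries introduced in Section IV we have

$$ L = \{(x,xx), (x,xy), (x,yx), (x,yy), (x,yz)\}, $$
$$ R_i = \sum_{j,k} |j\rangle\langle j|_3 \otimes |k\rangle\langle k|_1 \otimes R'_{i,jk}, \tag{66} $$
$$ [R'_{i,jk},\; U^*_4 \otimes U_2 \otimes U_0] = 0, \tag{67} $$
$$ R'_{i,jk} = \begin{cases} R_{x,xx} & \text{if } i = j = k \\ R_{x,xy} & \text{if } i = j \neq k \\ R_{x,yx} & \text{if } i = k \neq j \\ R_{x,yy} & \text{if } j = k \neq i \\ R_{x,yz} & \text{if } i \neq j \neq k \neq i \end{cases} \tag{68} $$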

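The figure of merit (52) becomes

$$ F = \frac{1}{d} \sum_{(a,bc) \in L} n(a,bc)\, \langle R_{a,bc} \rangle. \tag{69} $$

Let us now consider the normalization condition of the optimal
generalized instrument

$$ \sum_i R_i = I_4 \otimes I_3 \otimes S_{210}, \qquad \mathrm{Tr}_2[S] = I_1 \otimes \rho_0. \tag{70} $$

Thanks to Eq. (66) we have

$$ \sum_i \sum_k |k\rangle\langle k|_1 \otimes R'_{i,jk} = I_4 \otimes S_{210} \qquad \forall j. \tag{71} $$

Using the property (32) we obtain

$$ I_4 \otimes \langle k| S_{210} |k\rangle_1 = \sum_i R'_{i,jk} = \sum_i R'_{\sigma(i),\sigma(j)\sigma(k)} = I_4 \otimes \big( \langle k| T_\sigma^\dagger \big)\, S_{210}\, \big( T_\sigma |k\rangle_1 \big) \qquad \forall j,k, \tag{72} $$

which implies

$$ \sum_i R'_{i,jk} = I_4 \otimes T_{20} \quad \forall j,k, \qquad \mathrm{Tr}_{20}[T] = 1. \tag{73} $$

The commutation relation (23) implies
$[I_4 \otimes T_{20},\; U^*_4 \otimes U_2 \otimes U_0] = 0$ and by
taking the trace on $\mathcal{H}_4$ we get

$$ [T_{20},\; U_2 \otimes U_0] = 0, \tag{74} $$

which due to Schur's lemmas requires
$T_{20} = t_+ P^+ + t_- P^-$. The normalization
$\mathrm{Tr}_{20}[T] = 1$ becomes

$$ d_+ t_+ + d_- t_- = 1, \tag{75} $$

where $d_\pm = \mathrm{Tr}[P^\pm]$, and Eq. (73) now reads for all
$j,k$

$$ \sum_i R'_{i,jk} = I_4 \otimes (t_+ P^+ + t_- P^-) = t_+ (P^\alpha \otimes |+\rangle\langle +| + P^\beta) + t_- (P^\alpha \otimes |-\rangle\langle -| + P^\gamma). \tag{76} $$

As a consequence of Eq. (73) the optimal strategy can be
parallelized.

[Diagram: parallelized learning network for two uses of the unknown measurement.]    (77)
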


Eq. ( [77] ) provides a further symmetry of the problem: 



Lemma 6 The operator R[ j/c in Eq. ( pq ) can 6e chosen 
to satisfy: 

R' iijk = SB! i>kj S Vfc,j (78) 
where S is i/ie swap operator S |fc) 2 |j) = |j) 2 |fc) . 

Proof. The proof consists in the standard averaging 
argument, let us define Rijk := j{R'i,jk + SP^^-S). It 
is easy to prove that {Pjjfc} satisfies the normalization 
( [73| ) and that gives the same value of F[R] as P.- fc ■.■ 



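Eq. (78), together with the decomposition (51), gives for all $(a,bc) \in \mathsf{L}$

$$r^\alpha_{a,bc} = \sigma_z\, r^\alpha_{a,cb}\, \sigma_z, \qquad r^\beta_{a,bc} = r^\beta_{a,cb}, \qquad r^\gamma_{a,bc} = r^\gamma_{a,cb}, \tag{79}$$

where $\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.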

Considering that $n(x,xx) = d$, $n(x,xy) = n(x,yx) = n(x,yy) = d(d-1)$, and $n(x,yz) = d(d-1)(d-2)$, and that $S R_{x,xy} S = R_{x,yx}$, the figure of merit in Eq. (52) can be written as

$$F = \langle R_{x,xx}\rangle + (d-1)\langle R_{x,yy}\rangle + 2(d-1)\langle R_{x,xy}\rangle + (d-1)(d-2)\langle R_{x,yz}\rangle$$
$$\;\; = \sum_\nu \mathrm{Tr}\Big[\Delta^\nu_{x,xx}\, r^\nu_{x,xx} + (d-1)\,\Delta^\nu_{x,yy}\, r^\nu_{x,yy} + 2(d-1)\,\Delta^\nu_{x,xy}\, r^\nu_{x,xy} + (d-1)(d-2)\,\Delta^\nu_{x,yz}\, r^\nu_{x,yz}\Big], \tag{80}$$

where

$$\Delta^\nu_{a,bc} := \mathrm{Tr}_{\mathcal{H}_\nu}\big[|ijk\rangle\langle ijk|\big], \tag{81}$$

and $i,jk$ is any triple of indices in the class denoted by $a,bc$. Notice that in the case $d = 2$ the last term in the sum of Eq. (80) is 0.
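The class sizes $n(a,bc)$ quoted above can be checked by brute force. The sketch below enumerates all index triples $(i,jk)$ for a given $d$ and groups them by their pattern of coincidences; the labelling helper is a hypothetical convenience that names the first distinct index $x$, the second $y$ and the third $z$.

```python
from collections import Counter
from itertools import product


def class_label(i, j, k):
    """Relabeling class of the triple (i, jk), e.g. 'x,xy' when i = j != k."""
    names = {}
    letters = []
    for index in (i, j, k):
        if index not in names:
            names[index] = "xyz"[len(names)]  # next unused letter
        letters.append(names[index])
    return f"{letters[0]},{letters[1]}{letters[2]}"


def class_sizes(d):
    """Count how many triples (i, jk) with indices in range(d) fall in each class."""
    return Counter(class_label(i, j, k) for i, j, k in product(range(d), repeat=3))
```

For $d = 2$ the class `x,yz` is empty, which is why the last term of Eq. (80) vanishes for qubits.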

The optimization of $F[R]$ can be carried out in two steps: first we maximize $F[R]$ for any fixed value of $t_+$ that satisfies Eq. (75); then we optimize the value of $t_+$. The optimization of $F[R]$ for fixed $t_+$ is carried out in Appendix A. According to Eq. (A14) we can write the figure of merit as

$$F[R] = \frac{d^2 + 3d + 4}{2(d+1)}\, t_+ + \sqrt{\frac{d-1}{d+1}\, t_+ t_-} + \frac{d}{2}\, t_-. \tag{82}$$

The last step of the optimization can be easily done by making the substitution $t_- = d_-^{-1}(1 - d_+ t_+)$ in Eq. (82) and then maximizing $F = F(t_+)$. We omit the details of the derivation and instead show a plot (Fig. 1) representing the values of D, F depending on the dimension.
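The one-parameter maximization over $t_+$ can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' derivation: it takes $d_\pm = d(d \pm 1)/2$, uses the bounds on $F_\alpha$, $F_\beta$, $F_\gamma$ as we read them from Appendix A for $d \geq 3$, and for $d = 2$ uses our own reading of the appendix remark that the $\gamma$ representation and the $x,yz$ class are absent. Since $F(t_+)$ is a linear term plus the square root of a concave quadratic, it is concave and a ternary search suffices.

```python
import math


def figure_of_merit(d, t_plus):
    """F as a function of t_plus (assumed forms of the appendix bounds)."""
    d_plus = d * (d + 1) / 2   # assumed dimension of the symmetric subspace
    d_minus = d * (d - 1) / 2  # assumed dimension of the antisymmetric subspace
    t_minus = (1 - d_plus * t_plus) / d_minus
    cross = math.sqrt(max(0.0, (d - 1) * t_plus * t_minus / (d + 1)))
    f_alpha = (2 * t_plus / (d + 1) + (d - 1) * t_plus / (2 * (d + 1))
               + t_minus / 2 + cross)
    if d >= 3:
        f_beta = (d + 1) * t_plus / 2
        f_gamma = (d - 1) * t_minus / 2
    else:
        # d = 2: no gamma irrep and no x,yz class (our reading of the remark)
        f_beta = t_plus + d * (d - 1) * t_plus / (2 * (d + 1))
        f_gamma = 0.0
    return f_alpha + f_beta + f_gamma


def optimize_t_plus(d, iterations=200):
    """Ternary search for the maximum of the concave function F(t_plus)."""
    lo, hi = 0.0, 2.0 / (d * (d + 1))  # t_plus in [0, 1/d_plus] keeps t_minus >= 0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if figure_of_merit(d, m1) < figure_of_merit(d, m2):
            lo = m1
        else:
            hi = m2
    t_opt = (lo + hi) / 2
    return t_opt, figure_of_merit(d, t_opt)
```

For $d = 2$ this sketch gives a maximum of about 0.81, in agreement with the value quoted for 2 → 1 learning of qubits later in the text.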

Due to Lemma | the replicated POVM has the follow- 
ing form: 



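$$G_i^{(U)} = \frac{dF - 1}{d - 1}\, U|i\rangle\langle i|U^\dagger + \frac{1 - F}{d - 1}\, I = \lambda\, U|i\rangle\langle i|U^\dagger + (1 - \lambda)\,\frac{I}{d},$$

where the values of the coefficient $\lambda$, describing the random mixing of a perfect replica with a trivial measurement, are depicted in Figure 2.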
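The relation between the figure of merit and the mixing coefficient can be turned into a small helper. This is a sketch that assumes the reading $\lambda = (dF - 1)/(d - 1)$ of the equation above, equivalently $F = \lambda + (1 - \lambda)/d$; the function names are illustrative.

```python
def mixing_coefficient(F, d):
    """Weight of the perfect replica, assuming lambda = (d F - 1) / (d - 1)."""
    return (d * F - 1) / (d - 1)


def fidelity_from_mixing(lam, d):
    """Inverse relation: F = lambda + (1 - lambda) / d."""
    return lam + (1 - lam) / d
```

For qubits the values $F = 0.75$, $0.81$ and $0.87$ quoted in the text for 1 → 1, 2 → 1 and 3 → 1 learning correspond to $\lambda = 0.5$, $0.62$ and $0.74$ under this relation.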



FIG. 1: Optimal learning of a measurement device: we present the values of D, F for different values of the dimension d. The square dots represent the optimal learning from a single use (1 → 1 learning), while the round dots and triangles represent the optimal learning from two uses (2 → 1 learning).

FIG. 2: Optimal learning of a measurement device: we present the values of $\lambda$, the admixture of perfect replica to white noise in the produced measurement, for different values of the dimension d. The square dots represent the optimal learning from a single use (1 → 1 learning), while the diamonds represent the optimal learning from two uses (2 → 1 learning).

C. 3 → 1 learning

In this section we consider a learning network which exploits 3 uses of the measurement device and produces a single replica:

[Diagram: 3 → 1 learning network.] (83)

In order to simplify the problem we restrict ourselves to the qubit case, that is, we set $d = 2$. The derivation of the optimal learning network turns out to be very involved, although it follows the same lines as for the 2 → 1 case. We carried out the calculations analytically with the help of a symbolic mathematical program.

The 3 → 1 scenario deserves interest because the optimal solution does not allow a strategy having the 3 uses of the measurement device in parallel. In other words, the optimal strategy needs to be adaptive.

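Let us consider the normalization condition for the generalized instrument $\{R_i\}$:

$$\sum_i R_i = I_{65} \otimes S_{43210}, \qquad R_i = \sum_{j,k,l} |jkl\rangle\langle jkl|_{531} \otimes R'_{i,jkl}, \qquad \mathrm{Tr}_4[S] = I_3 \otimes T_{210}. \tag{84}$$

This implies

$$\sum_i R'_{i,jkl} = I_6 \otimes \langle kl| S_{43210} |kl\rangle_{31} \quad \forall j, \qquad \langle kl|\, \mathrm{Tr}_4[S]\, |kl\rangle_{31} = \langle l| T_{210} |l\rangle_1 \quad \forall k. \tag{85}$$

From the relabeling symmetry $R'_{i,jkl} = R'_{\sigma(i),\sigma(j)\sigma(k)\sigma(l)}$ we have $\langle kl| S |kl\rangle = \langle \sigma(k)\sigma(l)| S |\sigma(k)\sigma(l)\rangle$, and consequently

$$\langle kl|\, \mathrm{Tr}_4[S]\, |kl\rangle_{31} = \frac{1}{d^2}\, \mathrm{Tr}_{431}[S] =: \widetilde{T}_{20} \quad \forall k,l. \tag{86}$$

This fact, along with Eq. (85), allows us to conclude that

$$\mathrm{Tr}_4[S] = \mathrm{Tr}_4\Big[\sum_{k,l} |kl\rangle\langle kl|_{31} \otimes \langle kl| S_{43210} |kl\rangle\Big] = \sum_{k,l} |kl\rangle\langle kl|_{31} \otimes \widetilde{T}_{20} = I_{31} \otimes \widetilde{T}_{20}, \tag{87}$$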



which means that the first two uses can be in parallel. We notice that, in general, $\langle kl| S |kl\rangle = \langle \sigma(k)\sigma(l)| S |\sigma(k)\sigma(l)\rangle$ does not imply that $\langle kl| S |kl\rangle = \widetilde{S}$ is independent of $k, l$, but only that $\langle kl| S |kl\rangle = S_{ab}$, where $a, b$ denotes the equivalence class of the couple $(k,l)$. Consequently, we cannot in general assume that all the examples can be used in parallel. In fact, the optimal learning network has the following causal structure

[Diagram: causal structure of the optimal 3 → 1 learning network.] (88)



where the state of system 4 depends on the classical outcomes in systems 3 and 1. The optimal value of $F[R]$ is approximately 0.87 (we recall that for 1 → 1 learning we had $F = 0.75$, while for the 2 → 1 case we had $F = 0.81$). The corresponding values of the coefficient $\lambda$ (see
Eqs.©,©) are depicted on Fig. (§). 

Remark 1 One can wonder whether, without assuming any symmetry, it is possible to find a non-symmetric parallel strategy $\{R_i\}$ that achieves the optimal value of $F[R]$. However, we recall that for any strategy $\{R_i\}$ we can build a symmetric one with the same normalization, that is, without spoiling the parallelism, and giving the same fidelity. Since the optimal symmetric network cannot be parallel, any other optimal network has to be sequential as well.



VI. CONCLUSIONS

We analyzed optimal learning of a measurement device. Our approach to the problem is based on the formalism of quantum combs and generalized quantum instruments, introduced in Refs. [8-10]. The original problem can be significantly simplified by utilizing the symmetries provided by the figure of merit. In particular, covariance and relabeling symmetry allow us to significantly decrease the number of parameters without affecting the figure of merit. As a consequence of the symmetry of the learning network, the replicated measurement can be seen as a random mixture of a perfect replica of the measurement device to be learnt, with weight $\lambda$, and of a trivial measurement producing all possible outcomes with the same probability independently of the input state, with weight $1 - \lambda$. For 2 → 1 and 3 → 1 learning the first two uses of the unknown measurement device can be parallelized, and this result can be generalized to N → 1 learning. However, the optimal learning algorithm cannot be further parallelized; namely, the examples exceeding the second one must be used sequentially. This feature is very unusual, and it occurs in few cases of quantum algorithms [13, 14]. For example, while the quantum part of Shor's algorithm can be parallelized, Grover's algorithm cannot, as was proved in Ref. [15]. Our results prove that quantum learning of a von Neumann measurement shares with Grover's algorithm the impossibility of parallelizing without affecting optimality. The parallelization of the first two examples is, from this point of view, a curious exception.

An obvious extension of the work would be to study the scaling of the performance of the optimal learning strategy with respect to N. However, our results show that optimal learning networks with different N do not share the same initial steps. This means that the optimization of N → 1 learning cannot be done inductively, building on the results from the (N − 1) → 1 case. The complexity of the optimization in the general case arises mainly from the causal influence of the steps of the learning strategy on the remaining part of the network, which is reflected in the recursive structure of the normalization constraints.

Appendix A: Calculations for 2 → 1 Learning

The explicit expression of $\Delta^\nu_{a,bc}$ in Eq. (81) is given by

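$$\Delta^\alpha_{x,xx} = \begin{pmatrix} \frac{2}{d+1} & 0 \\ 0 & 0 \end{pmatrix}, \qquad \Delta^\alpha_{x,xy} = \frac{1}{2}\begin{pmatrix} \frac{1}{d+1} & \frac{1}{\sqrt{d^2-1}} \\ \frac{1}{\sqrt{d^2-1}} & \frac{1}{d-1} \end{pmatrix},$$

$$\Delta^\alpha_{x,yy} = \Delta^\alpha_{x,yz} = 0, \qquad \Delta^\alpha_{x,yx} = \sigma_z\, \Delta^\alpha_{x,xy}\, \sigma_z,$$

$$\Delta^\beta_{x,xx} = \frac{d-1}{d+1}, \qquad \Delta^\beta_{x,xy} = \Delta^\beta_{x,yx} = \frac{d}{2(d+1)}, \qquad \Delta^\beta_{x,yy} = 1, \qquad \Delta^\beta_{x,yz} = \frac{1}{2},$$

$$\Delta^\gamma_{x,xx} = \Delta^\gamma_{x,yy} = 0, \qquad \Delta^\gamma_{x,xy} = \Delta^\gamma_{x,yx} = \frac{d-2}{2(d-1)}, \qquad \Delta^\gamma_{x,yz} = \frac{1}{2}. \tag{A1}$$

Introducing the notation

$$s^\nu_{x,xx} := r^\nu_{x,xx}, \quad s^\nu_{x,yy} := (d-1)\, r^\nu_{x,yy}, \quad s^\nu_{x,xy} := (d-1)\, r^\nu_{x,xy}, \quad s^\nu_{x,yx} := (d-1)\, r^\nu_{x,yx}, \quad s^\nu_{x,yz} := (d-2)(d-1)\, r^\nu_{x,yz}, \tag{A2}$$

the figure of merit (80) becomes

$$F = F_\alpha + F_\beta + F_\gamma, \qquad F_\nu = \sum_{(a,bc)\in\mathsf{L}} \mathrm{Tr}\big[\Delta^\nu_{a,bc}\, s^\nu_{a,bc}\big], \quad \nu \in \{\alpha,\beta,\gamma\}. \tag{A3}$$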
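A quick consistency check on the coefficients $\Delta^\nu_{a,bc}$: since $|ijk\rangle$ is normalized and the three sectors $\alpha$, $\beta$, $\gamma$ exhaust the space, the traces of the $\Delta^\nu_{a,bc}$ over $\nu$ should add up to 1 for every class. The sketch below uses exact rational arithmetic and takes the diagonal entries as we read them from Eq. (A1), which is an assumption wherever the source is hard to read; only the diagonal of the $2 \times 2$ blocks enters the trace.

```python
from fractions import Fraction


def delta_traces(d):
    """(Tr Delta^alpha, Delta^beta, Delta^gamma) per class, as exact fractions."""
    d = Fraction(d)
    half = Fraction(1, 2)
    zero = Fraction(0)
    one = Fraction(1)
    return {
        "x,xx": (2 / (d + 1), (d - 1) / (d + 1), zero),
        "x,xy": (half / (d + 1) + half / (d - 1),
                 d / (2 * (d + 1)),
                 (d - 2) / (2 * (d - 1))),
        "x,yy": (zero, one, zero),
        "x,yz": (zero, half, half),
    }


def traces_sum_to_one(d):
    """True if the three sectors add up to 1 for every class."""
    return all(sum(values) == 1 for values in delta_traces(d).values())
```

The class $x,yx$ is omitted because it has the same traces as $x,xy$ (the two are related by conjugation with $\sigma_z$).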



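We express $R'_{i,jk}$ through $R_{a,bc}$, $(a,bc) \in \mathsf{L}$, and Equation (39). Depending on whether $j = k$ or $j \neq k$, Eq. (76) is equivalent to the following relations:

$$j = k: \qquad s^\alpha_{x,xx} + s^\alpha_{x,yy} = \begin{pmatrix} t_+ & 0 \\ 0 & t_- \end{pmatrix}, \qquad s^\beta_{x,xx} + s^\beta_{x,yy} = t_+, \qquad s^\gamma_{x,xx} + s^\gamma_{x,yy} = t_-, \tag{A4}$$

$$j \neq k: \qquad s^\alpha_{x,xy} + \sigma_z s^\alpha_{x,xy} \sigma_z + s^\alpha_{x,yz} = \begin{pmatrix} (d-1)t_+ & 0 \\ 0 & (d-1)t_- \end{pmatrix}, \qquad 2 s^\beta_{x,xy} + s^\beta_{x,yz} = (d-1)t_+, \qquad 2 s^\gamma_{x,xy} + s^\gamma_{x,yz} = (d-1)t_-, \tag{A5}$$

where we utilized Equation (79) implied by Lemma 6. We now derive the optimal learning network for a fixed value of $t_+$ (remember that $t_- = (1 - d_+ t_+)/d_-$).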

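First we maximize $F_\beta$ and $F_\gamma$ for the case $d \geq 3$. Using the expressions for the $\Delta^\nu_{a,bc}$ from Eq. (A1) we have:

$$F_\beta = \sum_{(a,bc)\in\mathsf{L}} \mathrm{Tr}\big[\Delta^\beta_{a,bc}\, s^\beta_{a,bc}\big] \leq \max\big(\Delta^\beta_{x,xx}, \Delta^\beta_{x,yy}\big)\, t_+ + \max\big(\Delta^\beta_{x,xy}, \Delta^\beta_{x,yz}\big)\,(d-1)\,t_+$$
$$\;\; = \Delta^\beta_{x,yy}\, t_+ + \Delta^\beta_{x,yz}\,(d-1)\,t_+ = t_+ + \frac{(d-1)\,t_+}{2} = \frac{(d+1)\,t_+}{2} \tag{A6}$$

and

$$F_\gamma = \sum_{(a,bc)\in\mathsf{L}} \mathrm{Tr}\big[\Delta^\gamma_{a,bc}\, s^\gamma_{a,bc}\big] \leq \max\big(\Delta^\gamma_{x,xx}, \Delta^\gamma_{x,yy}\big)\, t_- + \max\big(\Delta^\gamma_{x,xy}, \Delta^\gamma_{x,yz}\big)\,(d-1)\,t_- = \Delta^\gamma_{x,yz}\,(d-1)\,t_- = \frac{(d-1)\,t_-}{2}, \tag{A7}$$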

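where we used the normalization constraints (A4) and (A5). The upper bounds (A6) and (A7) can be achieved by taking

$$s^\beta_{x,xx} = s^\beta_{x,xy} = s^\beta_{x,yx} = 0, \qquad s^\beta_{x,yy} = t_+, \qquad s^\beta_{x,yz} = (d-1)\,t_+,$$

$$s^\gamma_{x,xx} = s^\gamma_{x,xy} = s^\gamma_{x,yx} = 0, \qquad s^\gamma_{x,yy} = t_-, \qquad s^\gamma_{x,yz} = (d-1)\,t_-.$$

For $d = 2$ the irreducible representation denoted by $\gamma$ and the $x,yz$ class do not exist and the optimization yields

$$s^\beta_{x,xy} = \frac{t_+(d-1)}{2}.$$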

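Let us now consider $F_\alpha$ (in this case there is no difference between $d \geq 3$ and $d = 2$). Based on the expression of $\Delta^\alpha_{a,bc}$ we have:

$$F_\alpha = \sum_{(a,bc)\in\mathsf{L}} \mathrm{Tr}\big[\Delta^\alpha_{a,bc}\, s^\alpha_{a,bc}\big] = \mathrm{Tr}\left[\begin{pmatrix} \frac{2}{d+1} & 0 \\ 0 & 0 \end{pmatrix} s^\alpha_{x,xx}\right] + \mathrm{Tr}\big[\Delta^\alpha_{x,xy}\, s^\alpha_{x,xy}\big] + \mathrm{Tr}\big[\Delta^\alpha_{x,yx}\, s^\alpha_{x,yx}\big]$$
$$\;\; \leq \frac{2}{d+1}\, t_+ + \mathrm{Tr}\left[\begin{pmatrix} \frac{1}{d+1} & \frac{1}{\sqrt{d^2-1}} \\ \frac{1}{\sqrt{d^2-1}} & \frac{1}{d-1} \end{pmatrix} s^\alpha_{x,xy}\right], \tag{A8}$$

and the bound can be achieved by taking

$$s^\alpha_{x,xx} = \begin{pmatrix} t_+ & 0 \\ 0 & t_- \end{pmatrix}. \tag{A9}$$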



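Let us now focus on the expression $\mathrm{Tr}[\Delta^\alpha_{x,xy}\, s^\alpha_{x,xy}]$. The normalization constraint (A5) for the operator $s^\alpha_{x,xy}$ can be rewritten as:

$$s^{\alpha,+,+}_{x,yz} + 2\, s^{\alpha,+,+}_{x,xy} = (d-1)\,t_+, \qquad s^{\alpha,-,-}_{x,yz} + 2\, s^{\alpha,-,-}_{x,xy} = (d-1)\,t_-, \tag{A10}$$

where we denoted $s^{\alpha,\pm,\pm}_{a,bc} := \langle \pm | s^\alpha_{a,bc} | \pm \rangle$. Then we have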



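$$\mathrm{Tr}\left[\begin{pmatrix} \frac{1}{d+1} & \frac{1}{\sqrt{d^2-1}} \\ \frac{1}{\sqrt{d^2-1}} & \frac{1}{d-1} \end{pmatrix} s^\alpha_{x,xy}\right] = \frac{s^{\alpha,+,+}_{x,xy}}{d+1} + \frac{s^{\alpha,-,-}_{x,xy}}{d-1} + \frac{2\,\mathrm{Re}\,\langle +| s^\alpha_{x,xy} |-\rangle}{\sqrt{d^2-1}}$$
$$\;\; \leq \frac{s^{\alpha,+,+}_{x,xy}}{d+1} + \frac{s^{\alpha,-,-}_{x,xy}}{d-1} + \frac{2\sqrt{s^{\alpha,+,+}_{x,xy}\, s^{\alpha,-,-}_{x,xy}}}{\sqrt{d^2-1}} \tag{A11}$$
$$\;\; \leq \frac{(d-1)\,t_+}{2(d+1)} + \sqrt{\frac{(d-1)\,t_+ t_-}{d+1}} + \frac{t_-}{2}, \tag{A12}$$

where we used the positivity of the operator $s^\alpha_{x,xy}$ for the inequality (A11) and the normalization (A10) for the second inequality (A12). The upper bound in Eq. (A12) can be achieved by taking

$$s^\alpha_{x,xy} = \frac{d-1}{2}\begin{pmatrix} t_+ & \sqrt{t_+ t_-} \\ \sqrt{t_+ t_-} & t_- \end{pmatrix}. \tag{A13}$$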



Finally, combining the optimal values of $F_\alpha$, $F_\beta$, and $F_\gamma$ we have

$$F[R] = \frac{d^2 + 3d + 4}{2(d+1)}\, t_+ + \sqrt{\frac{d-1}{d+1}\, t_+ t_-} + \frac{d}{2}\, t_-. \tag{A14}$$
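As a cross-check of how the three contributions combine, the sum can be written out term by term. This is a sketch for $d \geq 3$ that takes the bounds on $F_\alpha$, $F_\beta$ and $F_\gamma$ as read from Eqs. (A6), (A7), (A8) and (A12); the intermediate grouping is ours.

```latex
\begin{align*}
F_\alpha + F_\beta + F_\gamma
  &= \underbrace{\frac{2\,t_+}{d+1} + \frac{(d-1)\,t_+}{2(d+1)} + \frac{t_-}{2}
     + \sqrt{\frac{(d-1)\,t_+ t_-}{d+1}}}_{F_\alpha}
   + \underbrace{\frac{(d+1)\,t_+}{2}}_{F_\beta}
   + \underbrace{\frac{(d-1)\,t_-}{2}}_{F_\gamma} \\
  &= \frac{4 + (d-1) + (d+1)^2}{2(d+1)}\, t_+
   + \sqrt{\frac{(d-1)\,t_+ t_-}{d+1}}
   + \frac{1 + (d-1)}{2}\, t_- \\
  &= \frac{d^2 + 3d + 4}{2(d+1)}\, t_+
   + \sqrt{\frac{d-1}{d+1}\, t_+ t_-}
   + \frac{d}{2}\, t_- .
\end{align*}
```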



[1] V. Vapnik, Statistical Learning Theory, Wiley-Interscience (1989), ISBN 0-471-03003-1.
[2] M. Nielsen and I. Chuang, Phys. Rev. Lett. 79, 321-324 (1997).
[3] M. Sasaki, A. Carlini, and R. Jozsa, Phys. Rev. A 64, 022317 (2001).
[4] M. Sasaki and A. Carlini, Phys. Rev. A 66, 022303 (2002).
[5] S. Gammelmark and K. Mølmer, New J. Phys. 11, 033017 (2009).
[6] L. K. Grover, Proceedings of the 28th Annual ACM Symposium on the Theory of Computing (1996).
[7] A. Bisio, G. Chiribella, G. M. D'Ariano, S. Facchini, and P. Perinotti, Phys. Rev. A 81, 032324 (2010).
[8] G. Chiribella, G. M. D'Ariano, and P. Perinotti, Phys. Rev. A 80, 022339 (2009).
[9] G. Chiribella, G. M. D'Ariano, and P. Perinotti, Phys. Rev. Lett. 101, 060401 (2008).
[10] G. Chiribella, G. M. D'Ariano, and P. Perinotti, Phys. Rev. Lett. 101, 180501 (2008).
[11] This is equivalent to a usage of a direct sum over the classical outcomes.
[12] For N + 1 ≤ d, this number is known as the Bell number B_{N+1}. In the case N + 1 > d the solution is provided by the sum for k = 1, ..., d of the numbers of disjoint partitions of a set with N + 1 elements into k subsets, which is the sum of Stirling numbers of the second kind S(N + 1, k) for 1 ≤ k ≤ d.
[13] A. W. Harrow, A. Hassidim, D. W. Leung, and J. Watrous, Phys. Rev. A 81, 032339 (2010).
[14] J. Fiurášek and M. Mičuda, Phys. Rev. A 80, 042312 (2009).
[15] C. Zalka, Phys. Rev. A 60, 2746 (1999).



